Mining SARS-CoV protease cleavage data using non-orthogonal decision trees: a novel method for decisive template selection
Abstract
MOTIVATION: Although the outbreak of severe acute respiratory syndrome (SARS) is currently over, the disease is expected to return. A critical challenge for scientists from various disciplines worldwide is to study the cleavage specificity of the SARS-related coronavirus (SARS-CoV) protease and to use that knowledge to design effective inhibitors against the disease. The most commonly used inductive programming methods for knowledge discovery from data assume that the elements of input patterns are orthogonal to each other. For example, if a sub-sequence is denoted P2-P1-P1'-P2', a conventional inductive programming method may produce a rule such as 'if P1 = Q, then the sub-sequence is cleaved; otherwise it is not cleaved'. If site P1 is not orthogonal to the other sites (for instance, P2, P1' and P2'), the predictive power of such rules may be limited. This study therefore aims to develop a novel method for constructing non-orthogonal decision trees for mining protease data.

RESULT: Eighteen coronavirus polyprotein sequences were downloaded from NCBI (http://www.ncbi.nlm.nih.gov); these sequences contain 252 experimentally determined cleavage sites. The sequences were scanned with a sliding window of size k to generate about 50,000 k-mer sub-sequences (k-mers for short), with k ranging from 4 to 12 in steps of two. The bio-basis function proposed by Thomson et al. was used to transform the k-mers into a high-dimensional numerical space, on which an inductive programming method was applied to derive a decision tree for decision-making. This transform is referred to as bio-mapping. The constructed decision trees select about 10 of the 50,000 k-mers, and this small set of selected k-mers is regarded as a set of decisive templates. Non-orthogonal decision trees are then constructed from the selected templates, and the prediction accuracy is significantly improved.
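The sliding-window step can be sketched in Python as below. This is a minimal illustration, not the authors' code: the function names `kmers_for_sequence` and `scan_all` are hypothetical, and the sketch assumes that k is even, that each window is centred on a candidate scissile bond (so a 4-mer reads P2-P1-P1'-P2'), and that known cleavage sites are given as 0-based indices of the P1' residue. The toy sequence is illustrative only and is not taken from the NCBI data.

```python
from typing import Dict, Iterable, List, Set, Tuple


def kmers_for_sequence(seq: str, k: int, cleaved_bonds: Set[int]) -> List[Tuple[str, int]]:
    """Slide a window of even size k along seq and label each k-mer.

    Assumed convention: bond b lies between seq[b - 1] (P1) and seq[b] (P1'),
    and the window covers P(k/2)...P1 | P1'...P(k/2)'.
    Label 1 means b is a known cleavage site, 0 means it is not.
    """
    if k % 2:
        raise ValueError("this sketch assumes an even window size k")
    half = k // 2
    windows = []
    # Only bonds with k/2 residues on each side yield a full window.
    for bond in range(half, len(seq) - half + 1):
        label = 1 if bond in cleaved_bonds else 0
        windows.append((seq[bond - half: bond + half], label))
    return windows


def scan_all(seq: str, cleaved_bonds: Set[int],
             ks: Iterable[int] = range(4, 13, 2)) -> Dict[int, List[Tuple[str, int]]]:
    """Generate labelled k-mers for k = 4, 6, ..., 12 (hypothetical helper)."""
    return {k: kmers_for_sequence(seq, k, cleaved_bonds) for k in ks}


# Toy example (hypothetical data): one cleaved bond between Q (index 3) and S (index 4).
for k, windows in scan_all("AVLQSGFRKM", {4}).items():
    print(k, windows)
```

For k = 4 the only positive window is ('LQSG', 1), read as P2 = L, P1 = Q, P1' = S, P2' = G.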
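Bio-mapping and template selection can be sketched as follows. The sketch assumes the commonly cited form of the bio-basis function, phi(x, b) = exp(gamma * (s(x, b) - s(b, b)) / s(b, b)), where s is a summed residue-substitution score and b is a template k-mer. Two further assumptions are flagged in the code: a hypothetical toy score (match = 4, mismatch = -1) stands in for a real substitution matrix such as BLOSUM62, and the tree is grown with a CART-style Gini split, which may differ from the inductive method used in the paper. Every training k-mer is a candidate template, and the templates tested at internal nodes play the role of decisive templates. Because each feature compares a whole k-mer with a template, no test is restricted to a single position such as P1.

```python
import math
from typing import List, Optional, Set, Tuple, Union

# Hypothetical toy scores standing in for a real substitution matrix (e.g. BLOSUM62).
MATCH, MISMATCH = 4, -1

# A tree is either a leaf label (int) or (template index, threshold, left, right).
Tree = Union[int, tuple]


def score(x: str, y: str) -> int:
    """Summed position-wise substitution score of two equal-length k-mers."""
    return sum(MATCH if a == b else MISMATCH for a, b in zip(x, y))


def bio_basis(x: str, template: str, gamma: float = 1.0) -> float:
    """Assumed form exp(gamma * (s(x, b) - s(b, b)) / s(b, b)); equals 1 when x == b."""
    sbb = score(template, template)
    return math.exp(gamma * (score(x, template) - sbb) / sbb)


def bio_map(kmers: List[str], templates: List[str], gamma: float = 1.0) -> List[List[float]]:
    """Bio-mapping: one numeric feature per template for every k-mer."""
    return [[bio_basis(x, t, gamma) for t in templates] for x in kmers]


def gini(labels: List[int]) -> float:
    """Gini impurity 2p(1 - p) of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X: List[List[float]], y: List[int]) -> Optional[Tuple[float, int, float]]:
    """Return (weighted impurity, template index, threshold) of the best split."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, j, thr)
    return best


def grow(X: List[List[float]], y: List[int], depth: int, max_depth: int, used: Set[int]) -> Tree:
    """Recursively grow a depth-limited tree, recording the templates it tests."""
    majority = int(2 * sum(y) >= len(y))
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all bio-mapped features are constant at this node
        return majority
    _, j, thr = split
    used.add(j)
    go_left = [row[j] <= thr for row in X]
    left = grow([r for r, g in zip(X, go_left) if g],
                [lab for lab, g in zip(y, go_left) if g], depth + 1, max_depth, used)
    right = grow([r for r, g in zip(X, go_left) if not g],
                 [lab for lab, g in zip(y, go_left) if not g], depth + 1, max_depth, used)
    return (j, thr, left, right)


def predict(tree: Tree, features: List[float]) -> int:
    """Follow template-similarity tests down to a leaf label."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if features[j] <= thr else right
    return tree


def select_templates(kmers: List[str], labels: List[int], gamma: float = 1.0, max_depth: int = 3):
    """Bio-map the training k-mers against themselves, grow a tree, return it and its templates."""
    X = bio_map(kmers, kmers, gamma)
    used: Set[int] = set()
    tree = grow(X, labels, 0, max_depth, used)
    return tree, [kmers[j] for j in sorted(used)]


# Toy usage with hypothetical labelled 4-mers (not data from the paper).
kmers = ["LQSG", "LQAG", "VQSA", "KRTE", "PDWE", "GHKT"]
labels = [1, 1, 1, 0, 0, 0]
tree, templates = select_templates(kmers, labels)
print(templates)  # ['LQSG']
print(predict(tree, bio_map(["LQSA"], kmers)[0]))  # 1
```

In practice only the selected templates need to be compared with a new k-mer; the sketch maps against all candidates so that the template indices stored in the tree stay valid.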
Similar resources
Production of authentic SARS-CoV M(pro) with enhanced activity: application as a novel tag-cleavage endopeptidase for protein overproduction.
The viral proteases have proven to be the most selective and useful for removing the fusion tags in fusion protein expression systems. As a key enzyme in the viral life-cycle, the main protease (M(pro)) is most attractive for drug design targeting the SARS coronavirus (SARS-CoV), the etiological agent responsible for the outbreak of severe acute respiratory syndrome (SARS) in 2003. In this stud...
Cleavage of the SARS Coronavirus Spike Glycoprotein by Airway Proteases Enhances Virus Entry into Human Bronchial Epithelial Cells In Vitro
BACKGROUND Entry of enveloped viruses into host cells requires the activation of viral envelope glycoproteins through cleavage by either intracellular or extracellular proteases. In order to gain insight into the molecular basis of protease cleavage and its impact on the efficiency of viral entry, we investigated the susceptibility of a recombinant native full-length S-protein trimer (triSpike)...
The papain-like protease of severe acute respiratory syndrome coronavirus has deubiquitinating activity.
Replication of the genomic RNA of severe acute respiratory syndrome coronavirus (SARS-CoV) is mediated by replicase polyproteins that are processed by two viral proteases, papain-like protease (PLpro) and 3C-like protease (3CLpro). Previously, we showed that SARS-CoV PLpro processes the replicase polyprotein at three conserved cleavage sites. Here, we report the identification and characterizat...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Investigating the Mechanism of Action of SARS-CoV-2 Virus for Drug Designing: A Review
Coronavirus Disease 2019 (COVID-19) is a viral pneumonia that emerged in December 2019 in Wuhan, China. It is caused by a new virus from the coronavirus family, named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). In this review study, articles on the new coronavirus infection published in English up to March 23, 2020 were reviewed. These articles were obtained by searching in PubMed, ...
Journal: Bioinformatics
Volume 21, Issue 11
Publication date: 2005